@petrochenkov
Contributor

When a declarative macro expands and produces tokens it marks their spans as coming from that specific macro expansion.
Marking is a relatively expensive operation - it needs to lock the global hygiene data.

Right now marking happens lazily, when a token is actually produced into the output.
But that means marking happens 100 times if `$($var)*` expands to a sequence of length 100 (the span of `$var` is marked and output as part of the resulting nonterminal token).

In this PR I'm trying to perform this marking eagerly and once.

  • Pros (perf): tokens from sequences are marked once (1 time instead of N).
  • Cons (perf): tokens that never end up in the output are still marked (1 time instead of 0).
  • Cons (perf): cloning of the used macro arm's right-hand side is required (`src` in `fn transcribe`).
  • Cons (perf): metavariable tokens of the `tt` kind weren't previously marked, but they are marked now (we can't tell whether the variable is `tt` this early). However, #119673 ("macro_rules: Preserve all metavariable spans in a global side table") will need `tt` metavars marked anyway.
  • Pros (diagnostics): some erroneous tokens are now correctly reported as coming from a macro expansion.
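
The perf trade-off in the list above can be illustrated with a toy model. This is only a sketch, not the rustc implementation: `Span`, `Marker`, `transcribe_lazy` and `transcribe_eager` are hypothetical names, and the "expensive" marking step is reduced to a counter so the two strategies can be compared.

```rust
use std::cell::Cell;

/// Toy stand-in for a span (hypothetical, not rustc's `Span`).
/// `marked` records whether it has been attributed to the expansion.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    id: u32,
    marked: bool,
}

/// Toy marker that counts how often the (expensive) marking step runs.
struct Marker {
    calls: Cell<usize>,
}

impl Marker {
    fn new() -> Self {
        Marker { calls: Cell::new(0) }
    }

    fn mark(&self, span: Span) -> Span {
        self.calls.set(self.calls.get() + 1);
        Span { marked: true, ..span }
    }
}

/// Lazy strategy: mark each token at the moment it is written to the output.
/// A body repeated N times is marked N times.
fn transcribe_lazy(body: &[Span], repetitions: usize, marker: &Marker) -> Vec<Span> {
    let mut out = Vec::new();
    for _ in 0..repetitions {
        for &span in body {
            out.push(marker.mark(span));
        }
    }
    out
}

/// Eager strategy: copy the body and mark it once up front, then emit the
/// already-marked tokens. The copy models the cost of cloning the macro arm.
fn transcribe_eager(body: &[Span], repetitions: usize, marker: &Marker) -> Vec<Span> {
    let marked: Vec<Span> = body.iter().map(|&span| marker.mark(span)).collect();
    let mut out = Vec::new();
    for _ in 0..repetitions {
        out.extend_from_slice(&marked);
    }
    out
}
```

With a two-token body repeated 100 times, the lazy version calls `mark` 200 times and the eager version 2 times, with identical output. With zero repetitions the lazy version never marks, while the eager version still pays for both tokens, which is the "1 time instead of 0" case.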

@rustbot
Collaborator

rustbot commented Jan 7, 2024

r? @cjgillot

(rustbot has picked a reviewer for you, use r? to override)

rustbot added the S-waiting-on-review and T-compiler labels Jan 7, 2024
@petrochenkov
Contributor Author

@bors try @rust-timer queue

rustbot added the S-waiting-on-perf label Jan 7, 2024
petrochenkov removed the S-waiting-on-review label Jan 7, 2024
@bors
Collaborator

bors commented Jan 7, 2024

⌛ Trying commit ee50351 with merge 52e9fd6...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 7, 2024
macro_rules: Eagerly mark spans of produced tokens
@bors
Collaborator

bors commented Jan 7, 2024

☀️ Try build successful - checks-actions
Build commit: 52e9fd6 (52e9fd61249ec48ea734e1e0c4b7adb3299e703d)

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 7, 2024
macro_rules: Add an expansion-local cache to span marker

Most tokens in a macro body typically have the same syntax context.
So the cache should usually be hit.

This change can either be combined with rust-lang#119689, or serve as its alternative, depending on perf results.
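
The idea of an expansion-local cache can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from rust-lang#119693: `SyntaxContext`, `GlobalHygiene`, `apply_mark` and `CachedMarker` here are hypothetical toy types, and the global hygiene data is modeled as a `Mutex`-protected table that counts how often it is locked for marking.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Toy syntax context identifier (hypothetical, not rustc's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct SyntaxContext(u32);

/// Stand-in for the global hygiene data: every marking request takes the lock.
struct GlobalHygiene {
    inner: Mutex<GlobalInner>,
}

struct GlobalInner {
    next: u32,
    table: HashMap<(SyntaxContext, u32), SyntaxContext>,
    lock_count: usize,
}

impl GlobalHygiene {
    fn new() -> Self {
        GlobalHygiene {
            inner: Mutex::new(GlobalInner {
                next: 1, // context 0 is treated as the root context
                table: HashMap::new(),
                lock_count: 0,
            }),
        }
    }

    /// Returns the context obtained by applying expansion `expn` to `ctxt`.
    fn apply_mark(&self, ctxt: SyntaxContext, expn: u32) -> SyntaxContext {
        let mut g = self.inner.lock().unwrap();
        g.lock_count += 1;
        if let Some(&known) = g.table.get(&(ctxt, expn)) {
            return known;
        }
        let fresh = SyntaxContext(g.next);
        g.next += 1;
        g.table.insert((ctxt, expn), fresh);
        fresh
    }

    /// Number of marking requests that reached the global table.
    fn lock_count(&self) -> usize {
        self.inner.lock().unwrap().lock_count
    }
}

/// Marker for a single expansion with a local cache in front of the global table.
struct CachedMarker<'a> {
    global: &'a GlobalHygiene,
    expn: u32,
    cache: HashMap<SyntaxContext, SyntaxContext>,
}

impl<'a> CachedMarker<'a> {
    fn new(global: &'a GlobalHygiene, expn: u32) -> Self {
        CachedMarker { global, expn, cache: HashMap::new() }
    }

    /// Only the first request for a given input context reaches the global table.
    fn mark(&mut self, ctxt: SyntaxContext) -> SyntaxContext {
        let (global, expn) = (self.global, self.expn);
        *self.cache.entry(ctxt).or_insert_with(|| global.apply_mark(ctxt, expn))
    }
}
```

If all tokens of a macro body share one syntax context, marking 100 of them through `CachedMarker` touches the global lock once; a token with a different context costs one more lookup.
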
@rust-timer
Collaborator

Finished benchmarking commit (52e9fd6): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.6% | [0.2%, 1.6%] | 42 |
| Regressions ❌ (secondary) | 0.6% | [0.4%, 1.0%] | 9 |
| Improvements ✅ (primary) | -0.5% | [-0.6%, -0.3%] | 9 |
| Improvements ✅ (secondary) | -2.0% | [-2.8%, -1.4%] | 4 |
| All ❌✅ (primary) | 0.4% | [-0.6%, 1.6%] | 51 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.1% | [2.1%, 2.1%] | 1 |
| Regressions ❌ (secondary) | 4.7% | [1.3%, 7.7%] | 4 |
| Improvements ✅ (primary) | -2.1% | [-2.1%, -2.1%] | 1 |
| Improvements ✅ (secondary) | -2.7% | [-3.7%, -1.7%] | 2 |
| All ❌✅ (primary) | 0.0% | [-2.1%, 2.1%] | 2 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.5% | [1.1%, 1.8%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.0% | [-3.6%, -2.4%] | 2 |
| All ❌✅ (primary) | 1.5% | [1.1%, 1.8%] | 2 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 668.625s -> 666.421s (-0.33%)
Artifact size: 308.39 MiB -> 308.40 MiB (0.00%)

rustbot added the perf-regression label and removed the S-waiting-on-perf label Jan 7, 2024
@petrochenkov
Contributor Author

Need to try this again after #119693.
@rustbot blocked

rustbot added the S-blocked label Jan 7, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 8, 2024
macro_rules: Add an expansion-local cache to span marker
@petrochenkov
Contributor Author

@bors try @rust-timer queue

rustbot added the S-waiting-on-perf label Jan 8, 2024
petrochenkov added the S-waiting-on-author label and removed the S-blocked label Jan 8, 2024
@bors
Collaborator

bors commented Jan 8, 2024

⌛ Trying commit 24f88f8 with merge c87e2bb...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 8, 2024
macro_rules: Eagerly mark spans of produced tokens
@bors
Collaborator

bors commented Jan 8, 2024

☀️ Try build successful - checks-actions
Build commit: c87e2bb (c87e2bb366dc9a97ccb828c2235eb3a1a53931e3)

@rust-timer
Collaborator

Finished benchmarking commit (c87e2bb): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.6% | [0.3%, 2.0%] | 53 |
| Regressions ❌ (secondary) | 0.6% | [0.3%, 1.2%] | 10 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.7% | [-0.9%, -0.5%] | 4 |
| All ❌✅ (primary) | 0.6% | [0.3%, 2.0%] | 53 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.1% | [1.1%, 1.1%] | 1 |
| Regressions ❌ (secondary) | 4.2% | [0.8%, 7.4%] | 4 |
| Improvements ✅ (primary) | -0.8% | [-0.9%, -0.7%] | 2 |
| Improvements ✅ (secondary) | -1.3% | [-1.3%, -1.3%] | 2 |
| All ❌✅ (primary) | -0.1% | [-0.9%, 1.1%] | 3 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.5% | [1.1%, 2.0%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.5% | [-0.5%, -0.4%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [-0.5%, 2.0%] | 4 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 667.831s -> 670.809s (0.45%)
Artifact size: 308.39 MiB -> 308.52 MiB (0.04%)

rustbot removed the S-waiting-on-perf label Jan 8, 2024
@petrochenkov
Contributor Author

After #119693, eager marking is still useful for huge sequences like those in `deep-vector`, but otherwise the one-time cost of marking everything up front outweighs the savings.

petrochenkov deleted the markeager branch February 22, 2025 19:10
